Deep Computer Vision
DL is used in digital image processing to solve difficult problems (e.g. image colorization, classification, segmentation and detection). DL methods such as CNNs improve prediction performance mostly by exploiting big data and plentiful computing resources, and have pushed the boundaries of what is possible. Problems that were once considered unsolvable are now solved with super-human accuracy (e.g. image classification). Since the field was reignited by Krizhevsky, Sutskever and Hinton in 2012, DL has dominated the domain thanks to substantially better performance than traditional methods.
See:
Resources
- https://github.com/kjw0612/awesome-deep-vision
- https://github.com/timzhang642/3D-Machine-Learning
- https://medium.com/@taposhdr/medical-image-analysis-with-deep-learning-i-23d518abf531
- http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
Applications
See:
- AI/Computer Vision/Background subtraction
- AI/Computer Vision/Image and video captioning
- AI/Computer Vision/Image-to-image translation
- AI/Computer Vision/Inpainting and restoration
- AI/Computer Vision/Object classification, image recognition
- AI/Computer Vision/Object detection
- AI/Computer Vision/Semantic segmentation
- AI/Computer Vision/Super-resolution
- AI/Computer Vision/Video Frame Interpolation
- AI/Computer Vision/Video segmentation and prediction
Code
- #CODE Vision - The torchvision package consists of popular datasets, model architectures, and common image transformations for CV (see the torchvision usage sketch after this list)
- #CODE Scenic - A Jax Library for Computer Vision Research and Beyond
- https://www.marktechpost.com/2021/10/30/google-research-introduces-scenic-an-open-source-jax-library-for-computer-vision-research/
- codebase with a focus on research around attention-based models for computer vision
- #PAPER SCENIC: A JAX Library for Computer Vision Research and Beyond (2021)
- #CODE Pytorch-image-models
- PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXt, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more (see the timm usage sketch after this list)
- https://rwightman.github.io/pytorch-image-models/
- #CODE Imgaug - Image augmentation for machine learning experiments
- #CODE Openface - Free and open source face recognition with deep neural networks
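
A minimal usage sketch for the torchvision entry above, kept here as a quick reference. The FakeData dataset and the ResNet-18 backbone are illustrative stand-ins, and the `weights` string assumes torchvision >= 0.13:

```python
import torch
from torchvision import datasets, models, transforms

# Typical torchvision pieces: a transform pipeline, a dataset, a pretrained model.
# FakeData keeps the example self-contained; swap in e.g. datasets.ImageFolder or CIFAR10.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = datasets.FakeData(size=64, image_size=(3, 256, 256), transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

# Pretrained ImageNet backbone (the `weights` argument assumes a recent torchvision)
model = models.resnet18(weights="DEFAULT")
model.eval()
with torch.no_grad():
    images, labels = next(iter(loader))
    logits = model(images)   # (16, 1000) ImageNet class scores
```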
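Similarly, a minimal usage sketch for Pytorch-image-models (timm); the chosen architecture name and the 10-class head are illustrative:

```python
import timm
import torch

# Browse available architectures and instantiate a pretrained backbone
print(timm.list_models("resnet*")[:5])
model = timm.create_model("resnet50", pretrained=True, num_classes=10)  # fresh 10-class head
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))   # (1, 10) logits
```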
References
- #PAPER Video Pixel Networks (Kalchbrenner 2016)
- #PAPER Pixel RNNs - Pixel Recurrent Neural Networks (van den Oord 2016)
- Pixel-RNN presents an architecture with recurrent layers and residual connections that predicts pixels sequentially along the vertical and horizontal axes. It models the joint distribution of pixels as a product of per-pixel conditional distributions, computed with row and diagonal LSTM layers (the masked-convolution sketch at the end of this section illustrates the same factorization). The model achieved state-of-the-art results in natural image generation.
- https://medium.com/a-paper-a-day-will-have-you-screaming-hurray/day-4-pixel-recurrent-neural-networks-1b3201d8932d
- https://christineai.blog/pixelcnn-and-pixelrnn/
- #PAPER Conditional Image Generation with PixelCNN Decoders (van den Oord 2016)
- #PAPER PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications (Salimans 2017)
- #PAPER #REVIEW Deep Learning for Computer Vision: A Brief Review (Voulodimos 2017)
- #PAPER FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models (Grathwohl 2018)
- #PAPER Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks (Dupont 2018)
- Using a generator G and a discriminator D, the aim is to generate realistic images conditioned on a set of known pixels
- The total loss combines a prior loss (the generated image should score highly under D) and a context loss (the generated image should match the known pixels)
- The context loss uses a smoothed mask over the known pixels (see the loss sketch at the end of this section)
- #PAPER Deep Learning vs. Traditional Computer Vision (O'Mahony 2019)
- #PAPER Parametric generation of conditional geological realizations using generative neural networks (Chan 2019)
- #PAPER Parametrization of Stochastic Inputs Using Generative Adversarial Networks With Application in Geology (Chan 2020)
- #PAPER Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning (Abrol 2021)
- #PAPER #REVIEW Deep learning-enabled medical computer vision (Esteva 2021)
- #PAPER Generative Models as Distributions of Functions (Dupont 2021)
- Generative models are typically trained on grid-like data such as images, which ties them to the underlying grid resolution
- Instead of discretized grids, individual data points are parametrized by continuous functions, and generative models are learned as distributions over these functions (see the coordinate-MLP sketch at the end of this section)
- Coordinate and feature pairs are treated as point clouds (sets with an underlying notion of distance), leveraging the PointConv framework
- The model can learn rich distributions of functions independently of data type and resolution
- #PAPER Diverse Generation from a Single Video Made Possible (Haim 2021)
- #PAPER Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator (Li 2021)
- #CODE https://github.com/d-li14/involution
- #CODE https://github.com/PrivateMaRyan/keras-involution2Ds
- Paper explained
- https://keras.io/examples/vision/involution/
- Involution is a general-purpose neural primitive that is versatile across a spectrum of deep learning models and vision tasks
- Involution bridges convolution and self-attention in design, while being more efficient and effective than convolution and simpler than self-attention in form
- The proposed involution operator can serve as a fundamental building block for a new generation of visual-recognition networks, powering different deep learning models on several prevalent benchmarks (see the involution sketch at the end of this section)
- #PAPER Unifying Nonlocal Blocks for Neural Networks (Zhu 2021)
- #PAPER X-volution: On the unification of convolution and self-attention (Chen 2021)
- #PAPER Bivolution: A Static and Dynamic Coupled Filter (Hu 2022)
- #PAPER Convolution of Convolution: Let Kernels Spatially Collaborate (Zhao 2022)
- #PAPER Scaling Autoregressive Models for Content-Rich Text-to-Image Generation (Yu 2022)
- #PAPER Autoregressive Image Generation using Residual Quantization (Lee 2022)
- #PAPER MultiMAE: Multi-modal Multi-task Masked Autoencoders (Bachmann 2022)
- #PAPER InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions (Wang 2022)
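
For the Pixel RNN / PixelCNN entries above, a minimal PyTorch sketch of the autoregressive factorization p(x) = prod_i p(x_i | x_<i), illustrated with PixelCNN-style masked convolutions. Layer widths are illustrative, and PixelRNN itself uses row and diagonal LSTM layers rather than masked convolutions:

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel is masked so each pixel only sees pixels above it
    and to its left (raster-scan order), enforcing the autoregressive
    factorization used by PixelRNN/PixelCNN."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")  # 'A' excludes the centre pixel, 'B' includes it
        k = self.kernel_size[0]
        mask = torch.ones_like(self.weight)
        mask[:, :, k // 2, k // 2 + (mask_type == "B"):] = 0  # right of centre
        mask[:, :, k // 2 + 1:, :] = 0                        # rows below centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask  # zero out "future" pixels before convolving
        return super().forward(x)

# Toy single-channel stack: first layer mask 'A', deeper layers mask 'B';
# the 1x1 output layer parameterizes a 256-way softmax per pixel.
net = nn.Sequential(
    MaskedConv2d("A", 1, 64, kernel_size=7, padding=3), nn.ReLU(),
    MaskedConv2d("B", 64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, kernel_size=1),
)
logits = net(torch.rand(1, 1, 28, 28))  # (1, 256, 28, 28)
```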
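For the conditioned-geology GAN entry (Dupont 2018), a sketch of the combined loss described in its notes; G, D, the mask smoothing and the weighting `lam` are hypothetical placeholders rather than the paper's exact choices:

```python
import torch

def conditioned_loss(G, D, z, y_known, mask, lam=0.1):
    """z: latent code being optimized; y_known: image containing the known pixels;
    mask: soft (smoothed) mask, ~1 where pixels are known, ~0 elsewhere;
    lam: weighting between the two terms (placeholder value)."""
    x_gen = G(z)
    # Context loss: the generated image should match the known pixels
    context = (mask * (x_gen - y_known)).abs().sum()
    # Prior loss: the generated image should score highly under the discriminator
    prior = -torch.log(D(x_gen) + 1e-8).mean()
    return context + lam * prior
```

In this setting one would typically hold G and D fixed and optimize the latent code z by gradient descent on this loss.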
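For "Generative Models as Distributions of Functions" (Dupont 2021), a sketch of the underlying idea: represent a single image as a continuous function over coordinates using a plain coordinate MLP. The paper additionally learns a distribution over such functions with a PointConv-based discriminator, which is not shown here:

```python
import torch
import torch.nn as nn

class ImageFunction(nn.Module):
    """Maps continuous (x, y) coordinates to RGB values, so the image
    representation is independent of any grid resolution."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, coords):          # coords: (N, 2) in [-1, 1]
        return self.net(coords)         # (N, 3) RGB values

# Fit the function to one image; it can later be sampled on any coordinate grid.
H = W = 32
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)   # (H*W, 2)
target = torch.rand(H * W, 3)                           # stand-in for real pixel values

f = ImageFunction()
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ((f(coords) - target) ** 2).mean()
    loss.backward()
    opt.step()
```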
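For the involution entry (Li 2021), a minimal PyTorch sketch of the operator: a K x K kernel is generated from the input at every spatial location and shared across the channels of each group. The reduction ratio, group count and the average pooling used for strides follow common choices and may differ from the reference implementation:

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Minimal involution layer: per-location kernels generated from the input,
    shared across the channels within each group."""
    def __init__(self, channels, kernel_size=3, groups=4, reduction=4, stride=1):
        super().__init__()
        self.k, self.g, self.s = kernel_size, groups, stride
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)   # channel reduction
        self.span = nn.Conv2d(channels // reduction, kernel_size * kernel_size * groups, 1)
        self.down = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2, stride=stride)

    def forward(self, x):               # x: (B, C, H, W), H and W divisible by stride
        b, c, h, w = x.shape
        h_out, w_out = h // self.s, w // self.s
        # Generate one K*K kernel per output location and per group
        kernel = self.span(self.reduce(self.down(x)))                 # (B, K*K*G, H', W')
        kernel = kernel.view(b, self.g, self.k * self.k, h_out, w_out).unsqueeze(2)
        # Unfold the input into K*K neighbourhoods and split channels into groups
        patches = self.unfold(x).view(b, self.g, c // self.g, self.k * self.k, h_out, w_out)
        # Weighted sum over the neighbourhood dimension
        out = (kernel * patches).sum(dim=3)                           # (B, G, C/G, H', W')
        return out.view(b, c, h_out, w_out)

# Quick shape check
y = Involution2d(64)(torch.randn(2, 64, 32, 32))   # -> (2, 64, 32, 32)
```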